Authors
- Ce Hao*
- Catherine Weaver*
- Chen Tang*
- Kenta Kawamoto
- Masayoshi Tomizuka*
- Wei Zhan*
* External authors
Venue
- RAL 2024
Date
- 2024
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
Abstract
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance.
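The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The one-dimensional environment, the hand-written policies, the fixed skill length, and every function name below are hypothetical. The abstract does not state the form of the regularizer, so the KL penalty toward a demonstration-derived prior is an assumption, chosen because it is a common way to express this kind of regularization.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussians given as lists of floats."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def regularized_reward(reward, kl_to_prior, alpha):
    """Reward minus a penalty for drifting from the prior (assumed form)."""
    return reward - alpha * kl_to_prior


def rollout(env_step, high_policy, low_policy, state, horizon, skill_len):
    """Pick a new skill every `skill_len` steps; act with the low-level policy."""
    total_reward = 0.0
    skill = None
    skills = []
    for t in range(horizon):
        if t % skill_len == 0:
            skill = high_policy(state)        # high level: choose a skill
            skills.append(skill)
        action = low_policy(state, skill)     # low level: primitive action
        state, reward = env_step(state, action)
        total_reward += reward
    return state, total_reward, skills


# Hypothetical toy task: move along a line; sparse reward only near the goal.
GOAL = 5.0


def toy_env_step(state, action):
    next_state = state + action
    reward = 1.0 if abs(next_state - GOAL) < 0.5 else 0.0
    return next_state, reward


def toy_high_policy(state):
    return 1.0 if state < GOAL else -1.0      # skill = direction of travel


def toy_low_policy(state, skill):
    return 0.5 * skill                        # fixed step along the skill


if __name__ == "__main__":
    final_state, ret, skills = rollout(
        toy_env_step, toy_high_policy, toy_low_policy,
        state=0.0, horizon=12, skill_len=4,
    )
    print(final_state, ret, skills)
```

In this sketch the high-level policy is consulted only at skill boundaries, which is the temporal abstraction the abstract refers to. In a learned setting, `regularized_reward` would be applied with the KL term computed between a policy's output distribution and the prior learned from offline demonstrations.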